Rapid Induction of Multiple Taxonomies for Enhanced Faceted Text Browsing

Authors

  • Lawrence Muchemi
  • Gregory Grefenstette
Abstract

In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method builds the taxonomy from sentence-level word co-occurrence frequencies, while the second bootstraps a Word2Vec-based algorithm with a directed crawler. We exploit DMOZ, the multilingual open-content directory of the World Wide Web, to seed the crawl, and the domain name to direct the crawl. The resulting domain corpus is then input to our algorithm, which automatically induces taxonomies. The induced taxonomies provide hierarchical semantic dimensions for faceted browsing. As part of an ongoing personal semantics project, we applied the resulting taxonomies to personal social media data (Twitter, Gmail, Facebook, Instagram, Flickr) with the objective of enhancing an individual's exploration of their personal information through faceted searching. We also perform a comprehensive corpus-based evaluation of the algorithms on many datasets drawn from the fields of medicine (diseases) and leisure (hobbies) and show that the induced taxonomies are of high quality.
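To make the first idea concrete, here is a minimal sketch of how a hierarchy could be derived from sentence-level co-occurrence counts. It is not the authors' algorithm, which the abstract does not specify. It assumes a simple heuristic: each term is attached to the more frequent term it co-occurs with most often, with ties broken in favour of the more specific (less frequent) candidate. The function name `induce_taxonomy`, the `min_cooccurrence` threshold, whitespace tokenization, and the sample sentences are all hypothetical choices for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; a real run would use sentences from the crawled domain corpus.
SAMPLE_SENTENCES = [
    "disease diabetes insulin",
    "disease diabetes",
    "disease malaria",
    "disease malaria",
    "disease diabetes insulin",
]


def induce_taxonomy(sentences, min_cooccurrence=2):
    """Return a {child: parent} map from sentence-level co-occurrence counts.

    Assumed heuristic (not taken from the paper): the parent of a term is
    the more frequent term it shares the most sentences with.
    """
    term_freq = Counter()  # number of sentences containing each term
    pair_freq = Counter()  # number of sentences containing both terms

    for sentence in sentences:
        terms = set(sentence.lower().split())  # naive whitespace tokenization
        term_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_freq[(a, b)] += 1

    parent_of = {}
    best_score = {}
    for (a, b), count in pair_freq.items():
        if count < min_cooccurrence or term_freq[a] == term_freq[b]:
            continue  # too rare, or no frequency difference to orient the edge
        parent, child = (a, b) if term_freq[a] > term_freq[b] else (b, a)
        # Prefer the highest co-occurrence count, then the less frequent parent.
        score = (count, -term_freq[parent])
        if child not in best_score or score > best_score[child]:
            best_score[child] = score
            parent_of[child] = parent
    return parent_of


taxonomy = induce_taxonomy(SAMPLE_SENTENCES)
```

On the toy corpus, `taxonomy` places "diabetes" and "malaria" under "disease" and "insulin" under "diabetes". A realistic pipeline would also need stop-word filtering and multi-word term detection, which this sketch omits.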

Similar resources

Analysis and Validation of Information Access Through Mono, Multidimensional and Dynamic Taxonomies

Access to complex information bases through multidimensional, dynamic taxonomies (also improperly known as faceted classifications) is becoming a hot concept both in research and in industry. In this paper, the major shortcomings of conventional, monodimensional taxonomic approaches, such as the independence of different branches of the taxonomy and insufficient scalability, are discussed. The ...

Extended Faceted Taxonomies for Web Catalogs

Indexing and retrieval in Web catalogs can benefit from using faceted taxonomies. A faceted taxonomy consists of a set of facets, where each facet consists of a predefined set of terms structured by a subsumption relation. We propose two extensions of faceted taxonomies, which allow inferring conjunctions of terms that are valid in the underlying domain. We give a model-theoretic interpretation...

Exploratory Access to Wikipedia through Faceted Dynamic Taxonomies

Users currently access Wikipedia through two traditional paradigms, text search and hypertext navigation. We believe that user access can be significantly improved by supporting a systematic conceptual exploration of the knowledge base through dynamic taxonomies with a faceted taxonomy organization. This approach allows the easy manipulation of sets of documents and the systematic and intuitive...

An Algebra for Specifying Compound Terms in Faceted Taxonomies

A faceted taxonomy is a set of taxonomies, each describing a given domain from a different aspect, or facet. The indexing of domain objects is done through conjunctive combinations of terms from the facets, called compound terms. A faceted taxonomy has several advantages over a single hierarchy of terms, including conceptual clarity, compactness and scalability. A drawback, however, is the cost...

An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies

A faceted taxonomy is a set of taxonomies, each describing a given domain from a different aspect, or facet. The indexing of domain objects is done through conjunctive combinations of terms from the facets, called compound terms. A faceted taxonomy has several advantages over a single hierarchy of terms, including conceptual clarity, compactness and scalability. A drawback, however, is the cost...

Publication date: 2016